Method for coding and decoding a digital video, and related coding and decoding devices

ABSTRACT

A method is described for generating a video stream by starting from a plurality of sequences of 2D and/or 3D video frames, wherein a video stream generator composes into a container video frame video frames coming from N different sources (S1, S2, S3, . . . , SN) and generates a single output video stream of container video frames, which is coded by an encoder, wherein said encoder enters into the output video stream a signalling adapted to indicate the structure of the container video frames. A corresponding method for regenerating the video stream is also described.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Reissue application of U.S. patent application Ser. No. 14/435,408, filed Apr. 13, 2015, now U.S. Pat. No. 9,961,324, which issued on May 1, 2018, which is a nationalization of PCT Application No. PCT/IB2013/059349, filed Oct. 14, 2013, and claims priority to Italian Application No. TO2012A0901, filed Oct. 15, 2012, which are incorporated herein by specific reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method for coding and decoding a digital video, in particular a method for coding a video stream into independent partitions, and to a corresponding method for independently decoding one or more partitions making up the video stream.

The present invention also relates to a device for coding a video stream into independent partitions and to a device for independently decoding one or more of said partitions.

2. Present State of the Art

The coding and distribution of independent video streams representing different views of the same event or of a mosaic of multimedia services (multiview video—Free-to-View Video) have long been known. Distributing such multiview videos to users typically requires coding a number of independent video streams matching the number of generated views.

A coding and decoding method of this kind is described, for example, in document “ISO/IEC 13818-1:2000 (E)—Information technology—Generic coding of moving pictures and associated audio information: Systems”, or in document “ISO/IEC 14496-10 Information technology—Coding of audio-visual objects Part 10: Advanced Video Coding” and in the corresponding document “ITU-T H.264—Advanced video coding for generic audiovisual services”, hereafter referred to as the H.264/AVC specification. The coding methods currently in use have several drawbacks, such as: the necessity of using a number of video encoders equal to the number of video components to be distributed; the difficult mutual synchronization among the video streams being distributed and between the video streams and the corresponding audio streams; the increased bandwidth required for transporting the video streams, due to the need for replicating similar signalling elements required for decoding each independent stream. On the other hand, the corresponding decoding methods require the use of multiple decoders for decoding and displaying two or more of the transmitted views, leading to higher complexity and cost of the user terminals' architecture.

It is also known that a single video stream can be used for distributing multiple independent views, as is the case, for example, of the so-called “mosaic” services, wherein the single frame is constituted by n frames extracted from independent videos and composed into one image, or by the two component videos of a 3D stereoscopic pair composed into a single frame (the so-called “Frame Packing Arrangement” or “frame compatible format”). Such composite videos are typically compressed by using any one of the available compression techniques, such as, for example, MPEG-2, H.264/AVC or HEVC. Such compression techniques provide no tools allowing a specification-compliant decoder to independently decode one or more of the component video streams. Methods have been developed which allow a 2D decoder to extract from the decoded video only one of the two component views of the stereoscopic pair, but these methods rely on the use of a suitable signalling that allows the decoder, once the entire container frame has been decoded, to cut and display a frame area containing only one of the two views.

It is currently impossible to code the video in such a way as to enable a decoder (upon user selection or due to limited computational or storage resources) to decode only a chosen subset of the whole frame. For example, it is not possible to code a video containing one of the above-mentioned Frame Packing Arrangements in a manner such that a 2D decoder, which is not interested in both images making up the stereoscopic pair, can decode and display only the region corresponding to one of the two views (e.g., the left one).

This implies wasting computational and energy resources. It should be noted that this problem is especially felt in the field of mobile terminals, where any undue utilization of computational resources can drastically shorten the battery life.

Furthermore, a decoder may be used in a device such as a set-top box or a smart gateway, to which one or more displays, not necessarily having homogeneous characteristics, can be connected. Let us consider, for example, the case of a smart gateway receiving a coded video stream from a distribution network (e.g., an IP network or a broadcasting network) or reading the stream from a storage device. A plurality of displays with different characteristics (e.g., an HD display or a tablet) can be connected to said smart gateway through cables and/or wireless connections. In such a case, the decoder should be able to adapt the decoded video to the characteristics of the display(s) to be served: if just one display with lower resolution than the decoded video is connected to the decoder, the latter should be able to decode only that part of the video which is most relevant for the terminal involved.

Besides, the current techniques only allow automatic identification of one of the component video streams (as in the above stereoscopic pair example), so that it is impossible to expressly indicate to the decoder the presence of the one or more additional component video streams. A “default” choice is thus imposed on the decoder with fewer resources, and the presence of alternative contents cannot be indicated.

Moreover, coding a single video stream, besides making it possible to scale the utilization of computational resources during the decoding process, also makes it possible to serve, according to different service models, terminals characterized by different availability of storage and computational resources. For example, it is conceivable to code the composition of four HD videos (1920×1080 pixels) as a single 4K (3840×2160 pixels) video stream: of such a video, a decoder with limited computational resources might decode a subset containing just one of the HD components; alternatively, a more powerful decoder might decode the entire 4K video and, for example, display the whole mosaic of contents.

SUMMARY OF THE INVENTION

It is one object of the present invention to define a coding method that allows coding into a single container video stream one or more different component video streams, so that at least one of the latter can be decoded independently of the others.

It is another object of the present invention to specify a decoding method which allows one or more component video streams to be independently decoded from a single container video stream through the use of a single decoder.

It is a further object of the present invention to provide an encoder which codes a container video stream made up of multiple component video streams, so as to allow one or more component video streams to be independently decoded.

It is yet another object of the present invention to provide a decoder that independently decodes at least one of a plurality of component video streams coded as a single container video stream.

BRIEF DESCRIPTION OF THE DRAWINGS

These and further aspects of the present invention will become more apparent from the following description, which will illustrate some embodiments thereof with reference to the annexed drawings, wherein:

FIG. 1 shows an image to be coded partitioned into groups of macroblocks (“slices”) in accordance with the H.264/AVC specification;

FIG. 2 shows an image to be coded partitioned into “tiles” in accordance with the HEVC specification;

FIG. 3 shows an example of composition of four independent 2D video streams into a single video stream;

FIG. 4 shows the composition of two independent stereoscopic video streams, in the form of 2D video pairs, into a single video stream;

FIG. 5 shows a process for selectively decoding one of the two images that constitute the stereoscopic pair, coded as a single video stream;

FIG. 6 shows a composition of a stereoscopic video stream and the associated depth maps into a single container video stream;

FIG. 7 shows a composition of a 2D video stream and a stereoscopic video stream into a single container video stream;

FIG. 8 is a block diagram of the process for composing and coding the video stream generated by composition of n independent video streams;

FIG. 9 shows an example of a method for decoding a video stream generated by the coding apparatus described in FIG. 8;

FIG. 10 shows a further method for decoding a video stream generated by a coding apparatus according to FIG. 8;

FIGS. 11 and 11bis show the composition of two views of a stereoscopic video stream into a single container video stream;

FIG. 12 is a table that describes a structure of a signalling to be entered into a coded video stream;

FIG. 13 is a table containing possible values of a parameter of the structure of FIG. 12;

FIGS. 14a-14d show a table with modifications to the syntax of the PPS of the HEVC standard, which are required for entering the signalling of FIG. 12;

FIGS. 15a-15f show a table with modifications to the syntax of the SPS of the HEVC standard, which are required for entering the signalling of FIG. 12.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The existing video coding standards, as well as those currently under definition, offer the possibility of partitioning the images that constitute digital video streams for the purpose of optimizing the coding and decoding processes. As shown in FIG. 1, the H.264/AVC specification allows creating groups of macroblocks, wherein the images to be coded are subdivided into different types of groups, called slices, which are then coded independently of one another. For example, as shown in FIG. 1 in regard to the subdivision called “Type 2”, the macroblocks can be grouped into slices having an arbitrary shape, so as to allow the quality of the coded video to be selectively varied as a function of the position of any “regions of interest”.

Instead, FIG. 2 shows a new type of image subdivision, called “tile”, which has been introduced into the specification of the new ITU/ISO/IEC HEVC (High Efficiency Video Coding) standard. This type of subdivision, based on the slice structure already existing in the H.264/AVC specification, has been introduced in order to allow parallelization of the video stream coding and decoding processes: the increasing spread and lower costs of parallel graphics processors (the so-called GPUs, Graphics Processing Units), which are now available even on mobile terminals such as telephones and PC tablets, have promoted the introduction of parallelization support tools which allow image formats to be brought to very high resolutions even on terminals typically having limited computational resources.

The HEVC specification has defined tiles in such a way as to allow the images that constitute the video stream to be segmented into regions and to make the decoding thereof mutually independent. The decoding process, however, even when parallelized, is still carried out for the entire image only, and the segments cannot be used independently of one another.

As mentioned in the above paragraphs, it would be useful to be able to partition the video stream in such a way that different terminals can decide, automatically or upon instructions received from the user, which parts of the video should be decoded and sent to the display for visualization.

FIGS. 3, 4, 6 and 7 illustrate different utilization scenarios where this kind of partitioning might prove useful.

FIG. 3 shows a container video stream which, for example, may be in the 4K (3840×2160 pixel) format and may contain four independent HD (1920×1080 pixel) videos. A user equipped with a 4K decoder may decode and display the entire video, while a user equipped with a less powerful decoder may limit the decoding to a single HD stream at a time.

FIG. 4 shows the transportation, as a single container video stream, of two stereoscopic video streams (in the form of two independent Left and Right video pairs), e.g., representing two different stereoscopic views of the same event, from which the user can choose the preferred view without necessarily having to decode the whole frame (with obvious implications in terms of energy consumption).

FIG. 6 shows the composition of a stereoscopic video and the associated depth maps into a single video stream. In this case, a decoder of a stereoscopic television set may decode only the part relating to the two images of the stereoscopic pair, located in the upper half of the image; the lower part will thus not be decoded. Instead, a decoder of an auto-stereoscopic television set using the well-known 2D+Z technique (construction of synthetic views from a single image plus the associated depth map) might, for example, decode only the left half of the image, whereas the decoder of a more sophisticated auto-stereoscopic television set may use both views and both depth maps to synthesize the intermediate views.

FIG. 7 shows the composition of a dual-resolution 2D video (e.g., intended for a display in 21:9 format), located in the upper half of the image, and the corresponding stereoscopic view in side-by-side format in the lower region.

The tile structure described in the HEVC specification is not sufficient to allow a decoder to properly recognize and decode the content transported by the container video. This problem can be solved by entering a suitable level of signalling describing which content is being transported in each one of the independently decodable regions and how to proceed in order to properly decode and display it.

At least two different scenarios can be foreseen. In the first one, it is necessary to indicate the association between the single contents and at least one of the tiles into which the image has been disassembled, together with its possible reassembly into a coherent video stream (for example, as shown in FIG. 11, a stereoscopic video stream might be subdivided into two tiles and, while a 2D decoder must be informed about the possibility of decoding one single tile, a 3D decoder might not adopt any specific strategy and decode the entire stream). In the second scenario, instead, the association between the single contents and each one of the tiles into which the image has been disassembled is indicated, together with its possible reassembly into a coherent video stream (for example, a stereoscopic video stream may be subdivided into two tiles and, while a 2D decoder must be informed about the possibility of decoding one single tile, a 3D decoder must be informed about the necessity of decoding the entire stream).

The proposed solution provides for entering a descriptor which indicates, for at least one of the tiles, one or more specific characteristics: for example, it must be possible to signal whether the content is a 2D one or, in the case of a stereoscopic content, the type of frame packing arrangement thereof. Furthermore, it is desirable to indicate any “relationships” (joint decoding and/or display) between tiles, the view identifier (to be used, for example, in the case of multiview contents), and a message stating whether the view in question is the right view or the left view of a stereoscopic pair, or a depth map. By way of example, the solution is illustrated as pseudo code in the table of FIG. 12, which describes the structure of the signalling to be entered into the coded video stream by using the data structures already employed in the H.264/AVC and HEVC specifications. It is nonetheless possible to adopt analogous signalling structures allowing the content of one or more tiles to be described in such a way as to allow a decoder to decode them appropriately.

Frame_packing_arrangement_type is an index that might correspond, for example, to the values commonly used in the MPEG-2, H.264/AVC or SMPTE specifications, which catalogue the currently known and used stereoscopic video formats.

Tile_content_relationship_bitmask is a bitmask that univocally describes, for each tile, its association with the other tiles into which the coded video stream has been subdivided.

Content_interpretation_type provides the information necessary for interpreting the content of each tile. An example is specified in the table of FIG. 13.
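By way of illustration only, the three fields just described can be pictured as one record per tile. The following C sketch shows a possible in-memory layout; the field widths, the MAX_TILES bound and the type names are our assumptions, not part of the FIG. 12 syntax:

    #include <stdint.h>

    #define MAX_TILES 32  /* assumed upper bound, not mandated by the signalling */

    typedef struct {
        uint8_t  frame_packing_arrangement_type;    /* e.g. 3 = side-by-side, 6 = 2D */
        uint32_t tile_content_relationship_bitmask; /* one bit per tile of the frame */
        uint8_t  view_id;                           /* view identifier (multiview)   */
        uint8_t  content_interpretation_type;       /* per the table of FIG. 13      */
    } tile_descriptor_t;

    typedef struct {
        int               num_tiles;
        tile_descriptor_t tiles[MAX_TILES];
    } tile_signalling_t;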

With reference to the above case, wherein a stereoscopic video is coded as two tiles, in order to ensure the decoding of just one view by a 2D decoder, the following information will be associated with tile 0:

- frame_packing_arrangement_type[0]=3
- tile_content_relationship_bitmask[0]=11
- view_id[0]=0
- content_interpretation_type[0]=2
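Given such values, a 2D decoder could pick the tile to decode with a check like the following sketch, which reuses the tile_signalling_t type above and takes content_interpretation_type==2 to mean "left view", as in this example (both the function and that reading of the values are our assumptions):

    /* Return the index of the tile a 2D decoder should decode:
     * the left view of a side-by-side pair, per the example values above. */
    static int select_tile_for_2d(const tile_signalling_t *sig)
    {
        for (int i = 0; i < sig->num_tiles; i++) {
            const tile_descriptor_t *t = &sig->tiles[i];
            if (t->frame_packing_arrangement_type == 3 && /* side-by-side */
                t->content_interpretation_type == 2)      /* left view    */
                return i;
        }
        return 0; /* fall back to the first tile */
    }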

It should be noted that this type of signalling might be used together with or instead of other tools, such as, for example, the cropping rectangle. The cropping rectangle technique, according to which it is mandatory to crop the part of the decoded frame inside a rectangle signalled by means of suitable metadata, is already commonly used for making “2D compatible” a stereoscopic video stream in the form of one of the frame packing arrangements that require the stereoscopic pair to be entered into a single frame. FIG. 11bis illustrates, for example, a frame containing the so-called “side-by-side” frame packing arrangement, wherein only the left view (the gray one in the figure) is contained in the cropping rectangle. Without tile partitioning, a 2D decoder should decode the whole frame, then apply the cropping and discard the right view (the white one in FIG. 11bis). By using the method of the invention, it is instead possible to code and signal the two views as separate tiles, thereby allowing a 2D decoder to decode just the area contained in the cropping rectangle.

Assuming, for example, that the video stream has been divided into four tiles, as shown in FIG. 4, the relationship among the tiles would be described by the following values:

- frame_packing_arrangement_type[0]=3
- frame_packing_arrangement_type[1]=3
- frame_packing_arrangement_type[2]=3
- frame_packing_arrangement_type[3]=3
- tile_content_relationship_bitmask[0]=1100
- tile_content_relationship_bitmask[1]=1100
- tile_content_relationship_bitmask[2]=0011
- tile_content_relationship_bitmask[3]=0011
- view_id[0]=0
- view_id[1]=0
- view_id[2]=1
- view_id[3]=1
- content_interpretation_type[0]=2
- content_interpretation_type[1]=1
- content_interpretation_type[2]=2
- content_interpretation_type[3]=1

This signalling indicates to the decoder that tiles 0 and 1 belong to the same 3D video content (tile_content_relationship_bitmask=1100) in side-by-side format (frame_packing_arrangement_type=3). The value of tile_content_relationship_bitmask allows the decoder to know that the two views (which belong to the same stereoscopic pair, because view_id=0 for both tiles) are contained in different tiles (and hence, in this case, at full resolution). Content_interpretation_type makes it possible to understand that tile 0 corresponds to the left view, while tile 1 corresponds to the right view.

The same considerations apply to tiles 2 and 3.
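A decoder can expand the bitmask into the set of related tiles with a loop such as the sketch below, reusing the types above. The bit ordering, with the most significant of num_tiles bits standing for tile 0 so that "1100" relates tiles 0 and 1, is our assumption:

    /* Return a bitset (bit i = tile i) of the tiles related to 'tile'. */
    static uint32_t tiles_related_to(const tile_signalling_t *sig, int tile)
    {
        uint32_t mask = sig->tiles[tile].tile_content_relationship_bitmask;
        uint32_t related = 0;
        for (int i = 0; i < sig->num_tiles; i++)
            if (mask & (1u << (sig->num_tiles - 1 - i)))
                related |= 1u << i; /* tile i belongs to the same content */
        return related;
    }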

The arrangement of FIG. 6, instead, is described by the following signalling:

- frame_packing_arrangement_type[0]=3
- frame_packing_arrangement_type[1]=3
- frame_packing_arrangement_type[2]=6
- frame_packing_arrangement_type[3]=6
- tile_content_relationship_bitmask[0]=1111
- tile_content_relationship_bitmask[1]=1111
- tile_content_relationship_bitmask[2]=1010
- tile_content_relationship_bitmask[3]=0101
- view_id[0]=1
- view_id[1]=1
- content_interpretation_type[0]=2
- content_interpretation_type[1]=1
- content_interpretation_type[2]=5
- content_interpretation_type[3]=5

Unlike FIG. 4, tile_content_relationship_bitmask is 1111 for tiles 0 and 1. This means that there is a relationship among all tiles. In particular, tiles 2 and 3 are 2D contents (frame_packing_arrangement_type=6) containing a depth map (content_interpretation_type=5) respectively associated with tile 0 (tile_content_relationship_bitmask=1010) and with tile 1 (tile_content_relationship_bitmask=0101).
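Under the same bit-order assumption, a view-synthesis decoder could locate the depth map associated with a given view tile as sketched below (content_interpretation_type==5 meaning "depth map" is taken from this example; the function itself is hypothetical):

    /* Return the index of the depth-map tile associated with 'view_tile',
     * or -1 if the signalling carries none. */
    static int find_depth_tile(const tile_signalling_t *sig, int view_tile)
    {
        for (int i = 0; i < sig->num_tiles; i++) {
            const tile_descriptor_t *t = &sig->tiles[i];
            if (t->content_interpretation_type == 5 &&   /* depth map */
                (t->tile_content_relationship_bitmask &
                 (1u << (sig->num_tiles - 1 - view_tile))))
                return i;
        }
        return -1;
    }

With the FIG. 6 values above, find_depth_tile(sig, 0) returns 2 and find_depth_tile(sig, 1) returns 3, matching the associations just described.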

In the syntax of the HEVC specification, this type of signalling could easily be coded as an SEI (Supplemental Enhancement Information) message: application information which, without altering the basic coding and decoding mechanisms, allows the construction of additional functions concerning not only the decoding, but also the subsequent visualization process. As an alternative, the same signalling could be entered into the Picture Parameter Set (PPS), a syntax element that contains information necessary for decoding a dataset corresponding to a frame. The table of FIGS. 14a-14d includes, highlighted in bold, the modifications, in the form of pseudo code, that need to be made to the syntax of the PPS of the HEVC standard in order to enter the above-mentioned signalling.
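Purely as an illustration of the payload such a message would carry, the sketch below serializes the descriptor into a byte-aligned buffer. The field order mirrors the pseudo code of FIG. 12, but the byte-aligned layout is our assumption and not the normative HEVC syntax, which uses its own bit-level descriptors:

    #include <stddef.h>
    #include <stdint.h>

    /* Serialize the signalling into 'buf' (big-endian fields), returning
     * the number of bytes written. 'buf' must hold 1 + 7*num_tiles bytes. */
    size_t write_tile_signalling(uint8_t *buf, const tile_signalling_t *sig)
    {
        size_t n = 0;
        buf[n++] = (uint8_t)sig->num_tiles;
        for (int i = 0; i < sig->num_tiles; i++) {
            const tile_descriptor_t *t = &sig->tiles[i];
            buf[n++] = t->frame_packing_arrangement_type;
            buf[n++] = (uint8_t)(t->tile_content_relationship_bitmask >> 24);
            buf[n++] = (uint8_t)(t->tile_content_relationship_bitmask >> 16);
            buf[n++] = (uint8_t)(t->tile_content_relationship_bitmask >> 8);
            buf[n++] = (uint8_t)(t->tile_content_relationship_bitmask);
            buf[n++] = t->view_id;
            buf[n++] = t->content_interpretation_type;
        }
        return n;
    }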

A further generalization might provide for entering the signalling into the Sequence Parameter Set (SPS): a syntax element that contains information necessary for decoding a dataset corresponding to a consecutive sequence of frames. The table of FIGS. 15a-15f includes, highlighted in bold, the modifications, in the form of pseudo code, that need to be made to the syntax of the SPS of HEVC in order to enter the above-mentioned signalling, wherein multiservice_flag is a variable that informs about the presence of multiple services within each tile and num_tile is the number of tiles within one frame.

FIG. 5 illustrates the selective tile decoding process. The video stream contains a pair of stereoscopic views, coded into two separate tiles.

The latter are described by the same signalling used for representing the content of FIG. 4 (in this case, however, the total number of tiles is 2).

FIG. 8 is a block diagram of an apparatus or a group of apparatuses that can implement the coding technique of the present invention. N video contents S₁-S_N are inputted to a “source composer”. The “source composer” may be a separate component or may be integrated as an input stage of a suitable encoder. The source composer composes the container video stream that transports the N component video streams, and then outputs it towards an encoder. The source composer may optionally add the signalling required for describing to the encoder the format of the component video streams and their positions within the container video stream.
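At the pixel level, the source composer's work amounts to copying each component frame into its region of the container frame. A minimal sketch for one 8-bit plane follows; the function name and parameters are ours, and chroma planes would be copied the same way at their subsampled size:

    #include <stdint.h>
    #include <string.h>

    /* Copy a comp_w x comp_h component plane into the container plane at
     * pixel offset (tile_x, tile_y). Strides are in bytes per row. */
    void compose_plane(uint8_t *container, int container_stride,
                       const uint8_t *component, int comp_w, int comp_h,
                       int comp_stride, int tile_x, int tile_y)
    {
        for (int row = 0; row < comp_h; row++)
            memcpy(container + (size_t)(tile_y + row) * container_stride + tile_x,
                   component + (size_t)row * comp_stride,
                   (size_t)comp_w);
    }

For the 4K mosaic of FIG. 3, for instance, the four HD sources would be copied at offsets (0,0), (1920,0), (0,1080) and (1920,1080) of a 3840×2160 container plane.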

An encoder receives the container video stream, constructs the tiles in such a way as to map them onto the structure of the single component video streams, generates the signalling describing the tiles, the structure of the component video streams and their relationships, and compresses the container video stream. If the “source composer” does not automatically generate the signalling that describes the component video streams, the encoder can be programmed manually by the operator. The compressed video stream outputted by the encoder can then be decoded in different ways, i.e., by selecting independent parts depending on the functional characteristics and/or computational resources of the decoder and/or of the display it is connected to. The audio of each component video stream can be transported in accordance with the specifications of the System Layer part adopted for transportation.

A 2D decoder analyzes the bitstream, finds the signalling of the two tiles containing the two views, and decides to decode a single tile, displaying only one image compatible with a 2D display. A 3D decoder, instead, will decode both tiles and will proceed with stereoscopic visualization on a 3D display.

Similarly, FIG. 9 shows a decoder which, when connected to the display, negotiates the characteristics (e.g., the resolution) of the video to be displayed and decides accordingly, in an autonomous manner, which part of the video stream is to be decoded. This decision might also be dictated by the manual intervention of a user: for example, in the event that the video being transmitted is a stereoscopic video coded into two tiles, and assuming that the user, although equipped with a 3D television set, nevertheless wants to watch that content in 2D format (such a decision may be manifested by pressing a specific remote control key), the decoder may adopt a different decoding strategy than the one it would have adopted automatically while negotiating the best display format with the television set.
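The negotiation just described boils down to choosing a decoding strategy. The sketch below models it with two inputs, a negotiated capability flag and the user's 2D override; both are placeholders for whatever mechanism (e.g., a display capability exchange or a remote control key) the actual device uses:

    typedef enum { DECODE_ONE_VIEW, DECODE_ALL_TILES } strategy_t;

    /* display_supports_3d: result of the negotiation with the display;
     * force_2d: manual override selected by the user. */
    strategy_t choose_strategy(int display_supports_3d, int force_2d)
    {
        if (force_2d || !display_supports_3d)
            return DECODE_ONE_VIEW;  /* decode only the tile carrying one view  */
        return DECODE_ALL_TILES;     /* decode both tiles for stereoscopic output */
    }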

FIG. 10 shows, instead, the case wherein the decoder is located inside a gateway that receives the coded stream and must serve heterogeneous terminals, characterized by the possibility of supporting different formats of the video content (e.g., some devices may have the ability of displaying stereoscopic contents, while, at the same time, other devices might only have a 2D display). The gateway automatically negotiates with or receives configuration instructions from each device, and then decodes one or more parts of the input content in such a way as to adapt them to the characteristics of each requesting device.

Therefore, the present invention relates to a method for generating a video stream by starting from a plurality of sequences of 2D and/or 3D video frames, wherein a video stream generator composes into a container video frame video frames coming from N different sources S₁, S₂, S₃, . . . , S_N. Subsequently, an encoder codes the single output video stream of container video frames by entering into it a signalling adapted to indicate the structure of the container video frames.

The invention also relates to a method for regenerating a video stream comprising a sequence of container frames, each one comprising a plurality of 2D and/or 3D video frames coming from N different sources S₁, S₂, S₃, . . . , S_N. A decoder reads a signalling adapted to indicate the structure of the container video frames and regenerates a plurality of video streams by extracting at least one or a subset of the plurality of video frames, decoding only those portions of the container video frames which comprise the video frames, among the plurality of 2D and/or 3D video frames, that have been selected for display.

The invention claimed is:
1. A method for generating a digital video stream in a video stream generator comprising a video stream receiver unit and a video encoder, wherein the video stream generator generates a container video stream containing a plurality of independently encoded regions, the method comprising: receiving by the video stream receiver unit three or more component video streams from a plurality of video sources; mapping by the video encoder the three or more component video streams to three or more independently decodable regions; entering by the video encoder a signal indicating a presence of corresponding three or more independently decodable regions; entering by the video encoder a signalling indicating an association between each of the three or more component video streams and each of the three or more independently decodable regions, whereby any of the three or more component video streams can be associated with any of the three or more independently decodable regions in an independent way; and outputting a digital video stream comprising the signal, the signalling and the container video stream.

2. The method according to claim 1, wherein the signalling entered by the video encoder enters a descriptor into the digital video stream indicating a type of content of the three or more component video streams.

3. The method according to claim 1, wherein each one of the three or more independently decodable regions is coded by the video encoder as a tile.

4. The method according to claim 1, wherein the video encoder employs the coding technique H.264/AVC or HEVC.

5. The method according to claim 2, wherein the signalling entered by the video encoder into the digital video stream that indicates the association between each of the three or more component video streams and each of the three or more independently decodable regions and that includes the descriptor indicating the type of content of the three or more component video streams is an SEI message.

6. The method according to claim 2, wherein the signalling that indicates the association between each of the three or more component video streams and each of the three or more independently decodable regions and that includes the descriptor indicating the type of content of the three or more component video streams is entered by the video encoder into an SPS signalling or into a PPS signalling.

7. The method according to claim 1, wherein the signalling entered by the video encoder into the digital video stream indicating the association between each of the three or more component video streams and each of the three or more independently decodable regions includes a bitmask.

8. The method according to claim 1, wherein the three or more component video streams represent one or more independent video streams.

9. The method according to claim 8, wherein the three or more component video streams include one or more of the following formats: one or more stereoscopic video pairs; video streams and depth maps; one or more video streams in the frame packing arrangement format; mosaic of independent videos.

10. The method according to claim 2, wherein the descriptor comprises one or more metadata describing: Frame packing arrangement; Content interpretation type; and View ID.

11. A method for decoding an encoded digital video stream including three or more component video streams in a video decoder comprising a signalling decoder and a video data decoder, the method comprising: reading by the signalling decoder a signal indicating a presence of three or more independently decodable regions; reading by the signalling decoder a signalling comprised in the digital video stream indicating an association between each of the three or more component video streams and each of the three or more independently decodable regions, wherein the three or more component video streams are originated by a plurality of video sources and wherein any of the three or more component video streams can be associated with any of the three or more independently decodable regions in an independent way; reading by the signalling decoder a descriptor comprised in the digital video stream indicating a type of content of each one of the three or more independently decodable regions; selecting for decoding one or more of the three or more independently decodable regions indicated by the signalling or by the descriptor; and decoding the selected one or more independently decodable regions by the video data decoder and outputting a decoded video stream obtained by the video data decoder for displaying.

12. The method according to claim 11, wherein the video decoder selects the one or more of the three or more independently decodable regions based on an evaluation of computational resources of the video decoder.

13. The method according to claim 11, wherein the selected one or more of the three or more independently decodable regions are made available for display on a single display.

14. The method according to claim 11, wherein the selected one or more of the three or more independently decodable regions are made available for display on multiple heterogeneous devices.

15. The method according to claim 11, wherein the selected one or more of the three or more independently decodable regions to be decoded are determined based on an automatic negotiation with a display device associated to the video decoder and configured to display the video data decoded by the video decoder.

16. The method according to claim 11, wherein the selected one or more of the three or more independently decodable regions to be decoded are determined based on a manual selection of a display format on a display device associated to the video decoder, performed by a user by means of a remote control device associated to the video decoder or to the display device.

17. A decoding device for decoding a digital video stream including three or more component video streams and configured to read a signal indicating a presence of three or more independently decodable regions, the decoding device comprising: a signalling decoder configured to read a signalling comprised in the digital video stream indicating an association between the three or more component video streams and the three or more independently decodable regions and configured to read a descriptor comprised in the digital video stream indicating a type of content of each one of the three or more independently decodable regions, wherein the three or more component video streams are originated by a plurality of video sources and wherein any of the three or more component video streams can be associated with any of the three or more independently decodable regions in an independent way; a video data decoder configured to decode video data comprised in the digital video stream according to a decoding strategy; and a selecting unit configured to select for decoding by the video data decoder a set of the three or more independently decodable regions indicated by the signalling or by the descriptor, wherein the video data decoder decodes the set of the three or more independently decodable regions selected by the selecting unit, and outputs a decoded digital video stream comprising the selected set of the three or more independently decodable regions.

18. The decoding device of claim 17, wherein the selecting unit is further configured to automatically or manually select for display on a display device associated to the decoding device the selected set of the three or more independently decodable regions decoded by the decoding device.

19. The decoding device according to claim 17, wherein the selecting unit is further configured to select, by means of a negotiation process with a display device associated to the decoding device, a display format comprising the selected set of the three or more independently decodable regions decoded by the decoding device.